Pseudometrics for State Aggregation in Average Reward Markov Decision Processes

Author

  • Ronald Ortner
Abstract

We consider how state similarity in average reward Markov decision processes (MDPs) may be described by pseudometrics. Introducing the notion of adequate pseudometrics, which are well adapted to the structure of the MDP, we show how these may be used for state aggregation. Upper bounds are given on the loss that may be incurred by working with the aggregated instead of the original MDP, and they are compared to the bounds that have been achieved for discounted reward MDPs.
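As a rough illustration of how a pseudometric can drive aggregation, below is a minimal Python sketch: states whose pairwise distances stay below a threshold epsilon are grouped into one aggregate state. The function names, the toy pseudometric, and the threshold are illustrative assumptions; the paper's adequate pseudometrics impose structural conditions on rewards and transition probabilities that this sketch does not check.

```python
# Illustrative sketch only: greedy epsilon-aggregation of MDP states under
# a generic pseudometric d. The "adequate" pseudometrics of the paper must
# additionally respect rewards and transition probabilities, which is not
# verified here.

def aggregate_states(states, d, epsilon):
    """Greedily partition states into clusters of pairwise distance < epsilon."""
    clusters = []
    for s in states:
        for cluster in clusters:
            if all(d(s, t) < epsilon for t in cluster):
                cluster.append(s)
                break
        else:  # no existing cluster fits: open a new one
            clusters.append([s])
    return clusters

# Toy pseudometric: distance between states is the gap in their rewards,
# so states with (near-)equal rewards collapse into one aggregate state.
rewards = {0: 0.5, 1: 0.5, 2: 1.0}
d = lambda s, t: abs(rewards[s] - rewards[t])
print(aggregate_states(list(rewards), d, epsilon=0.1))  # [[0, 1], [2]]
```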

Similar articles

Adaptive aggregation for reinforcement learning in average reward Markov decision processes

We present an algorithm that aggregates states online while learning to behave optimally in an average reward Markov decision process. The algorithm is based on the reinforcement learning algorithm UCRL and uses confidence intervals for aggregating the state space. We derive bounds on the regret our algorithm suffers with respect to an optimal policy. These bounds are only slightly worse than the orig...
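A hedged sketch of the confidence-interval idea: two states become candidates for merging when their empirical reward confidence intervals overlap. The Hoeffding-type interval width and the `may_merge` test below are simplified stand-ins; the actual UCRL-based algorithm also compares estimated transition probabilities.

```python
import math

# Simplified stand-in for confidence-interval-based aggregation in the
# spirit of UCRL-style algorithms: states whose reward confidence
# intervals overlap cannot be told apart yet and may be aggregated.
# (The real criterion also involves transition-probability estimates.)

def conf_width(n, delta=0.05):
    """Hoeffding-type confidence-interval half-width after n samples."""
    return math.sqrt(math.log(2 / delta) / (2 * max(n, 1)))

def may_merge(mean_s, n_s, mean_t, n_t):
    """True if the reward confidence intervals of the two states overlap."""
    return abs(mean_s - mean_t) <= conf_width(n_s) + conf_width(n_t)

print(may_merge(0.50, 100, 0.55, 100))    # True: intervals still overlap
print(may_merge(0.20, 1000, 0.80, 1000))  # False: rewards clearly differ
```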

A State Aggregation Approach to Singularly Perturbed Markov Reward Processes

In this paper, we propose a single sample path based algorithm with state aggregation to optimize the average reward of singularly perturbed Markov reward processes (SPMRPs) with large-scale state spaces. It is assumed that such a reward process depends on a set of parameters. Unlike other kinds of Markov chains, SPMRPs have their own hierarchical structure. Based on this special s...
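For context, a singularly perturbed Markov chain is conventionally written P(ε) = P₀ + εQ, where P₀ consists of strongly connected blocks (fast dynamics) and the perturbation εQ adds rare transitions between blocks (slow dynamics); this two-level structure is what makes state aggregation natural here. The NumPy snippet below illustrates only this standard model structure, not the paper's sample-path optimization algorithm.

```python
import numpy as np

# Structure of a singularly perturbed Markov chain: P(eps) = P0 + eps * Q.
# P0 has two strongly connected blocks ({0,1} and {2,3}); eps * Q adds
# rare cross-block transitions. Aggregating each block into one "slow"
# state is the natural hierarchical reduction.
P0 = np.array([[0.7, 0.3, 0.0, 0.0],
               [0.4, 0.6, 0.0, 0.0],
               [0.0, 0.0, 0.5, 0.5],
               [0.0, 0.0, 0.2, 0.8]])
Q = np.array([[-1.0,  0.0,  1.0,  0.0],
              [ 0.0, -1.0,  0.0,  1.0],
              [ 1.0,  0.0, -1.0,  0.0],
              [ 0.0,  1.0,  0.0, -1.0]])  # rows sum to 0, so P(eps) stays stochastic
eps = 0.01
P = P0 + eps * Q
assert np.allclose(P.sum(axis=1), 1.0) and (P >= 0).all()
```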

Semi-Markov Decision Processes

Considered are infinite horizon semi-Markov decision processes (SMDPs) with finite state and action spaces. The total expected discounted reward and long-run average expected reward optimality criteria are reviewed. Solution methodology for each criterion is given; constraints and variance sensitivity are also discussed.
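For reference, the two criteria can be written out as follows (standard textbook definitions, not taken from the paper). Here $\sigma_n$ denotes the time of the $n$-th decision epoch, $r$ the reward, $\tau(s,a)$ the expected sojourn time, and $\mu_\pi$ the stationary distribution of the embedded chain under a stationary policy $\pi$ in the unichain case.

```latex
% Total expected discounted reward (discount rate \alpha > 0):
V_\alpha^\pi(s) = \mathbb{E}_s^\pi\left[\sum_{n=0}^{\infty} e^{-\alpha \sigma_n}\, r(s_n, a_n)\right]

% Long-run average expected reward (ratio of reward rate to time rate):
g^\pi = \lim_{T \to \infty} \frac{\mathbb{E}^\pi[\text{reward earned up to time } T]}{T}
      = \frac{\sum_{s} \mu_\pi(s)\, r(s, \pi(s))}{\sum_{s} \mu_\pi(s)\, \tau(s, \pi(s))}
```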

The BisimDist Library: Efficient Computation of Bisimilarity Distances for Markovian Models

This paper presents a library for exactly computing Kantorovich-based bisimilarity pseudometrics between Markov chains and between Markov decision processes. These distances measure the behavioral discrepancies between non-bisimilar systems. They are computed using an on-the-fly greedy strategy that avoids exhaustive state space exploration and does not require a complete ...
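For intuition, a naive fixed-point computation of a discounted Kantorovich-based bisimilarity pseudometric for a labelled Markov chain might look as follows. This is emphatically not the BisimDist library's on-the-fly greedy algorithm: it solves one transportation LP per state pair per iteration, and the names, the discount factor gamma, and the iteration count are illustrative assumptions.

```python
import numpy as np
from scipy.optimize import linprog

def kantorovich(p, q, d):
    """Minimal transport cost between distributions p, q w.r.t. ground distance d."""
    n = len(p)
    # Variables: coupling matrix omega, flattened row-major; cost sum_ij omega_ij d_ij.
    A_eq = np.zeros((2 * n, n * n))
    for i in range(n):
        A_eq[i, i * n:(i + 1) * n] = 1.0  # row marginals: sum_j omega_ij = p_i
        A_eq[n + i, i::n] = 1.0           # column marginals: sum_i omega_ij = q_i
    res = linprog(d.flatten(), A_eq=A_eq, b_eq=np.concatenate([p, q]), bounds=(0, None))
    return res.fun

def bisim_distance(P, labels, gamma=0.9, iters=50):
    """Iterate d(s,t) = 1 if labels differ, else gamma * K_d(P[s], P[t])."""
    n = len(labels)
    d = np.array([[0.0 if labels[s] == labels[t] else 1.0 for t in range(n)]
                  for s in range(n)])
    for _ in range(iters):
        new = d.copy()
        for s in range(n):
            for t in range(s + 1, n):
                if labels[s] == labels[t]:
                    new[s, t] = new[t, s] = gamma * kantorovich(P[s], P[t], d)
        d = new
    return d

# Tiny example: states 0 and 1 share a label, but state 0 can leak to the
# differently-labelled state 2, so their distance is strictly positive.
P = np.array([[0.9, 0.0, 0.1], [0.5, 0.5, 0.0], [0.0, 0.0, 1.0]])
print(bisim_distance(P, labels=["a", "a", "b"]))
```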

Combinations and Mixtures of Optimal Policies in Unichain Markov Decision Processes are Optimal

We show that combinations of optimal (stationary) policies in unichain Markov decision processes are optimal. That is, let M be a unichain Markov decision process with state space S, action space A, and policies π◦_j : S → A (1 ≤ j ≤ n) with optimal average infinite horizon reward. Then any combination π of these policies, where for each state i ∈ S there is a j such that π(i) = π◦_j(i), is opt...
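The result can be set out in display form; the wording below transcribes the abstract with the notation cleaned up, and the truncated conclusion is completed in line with the paper's title.

```latex
% Restatement of the result (conclusion completed per the paper's title).
\textbf{Theorem.} Let $M$ be a unichain Markov decision process with state
space $S$, action space $A$, and stationary policies
$\pi^\circ_j : S \to A$ ($1 \le j \le n$), each achieving the optimal
average infinite-horizon reward. Then any combination $\pi$ of these
policies, i.e.\ any $\pi$ with $\pi(i) \in \{\pi^\circ_1(i), \dots,
\pi^\circ_n(i)\}$ for all $i \in S$, is itself average-reward optimal.
```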

Publication year: 2007